An Empirical Comparison of Contemporary Unsupervised Approaches for Extractive Speech Summarization
Abstract
With the rapid growth of the Internet and the arrival of the big data era, automatic summarization has emerged as a popular research topic. The aim of automatic summarization is to select important text or spoken sentences that represent the topic (theme) of the original text or spoken document according to a predefined summarization ratio. In this study, we frame the automatic summarization task as an ad hoc information retrieval (IR) problem and employ the mathematically sound language modeling (LM) framework for extractive speech summarization, which can perform important sentence selection in an unsupervised manner and has shown preliminary success. The main contribution of this paper is threefold. First, by virtue of relevance modeling, we explore several effective sentence modeling formulations to enhance the sentence models involved in the LM-based summarization framework, and we present the first use of a tri-mixture model to improve the performance of extractive speech summarization. Second, since language modeling suffers from the data sparseness problem, for which the common solution is to adopt smoothing techniques, we investigate three different smoothing approaches and evaluate how they influence summarization performance. Third, we apply the well-studied BM25 ranking model and its variants from the IR community to rank important sentences in extractive speech summarization. Experiments conducted on a publicly available dataset (MATBN) show that the applied methods achieve effective summarization performance compared with other well-practiced and state-of-the-art unsupervised methods.
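The two ranking ideas named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: sentences arrive as token lists; the LM score is the log-likelihood of the whole document under each sentence model; Jelinek-Mercer interpolation stands in for the three smoothing approaches, which the abstract does not name; and the document itself serves as the background model. The BM25 variant treats the document's distinct words as the query and each sentence as a retrievable unit. The function names `lm_scores`, `bm25_scores`, and `summarize`, and all parameter defaults, are hypothetical.

```python
import math
from collections import Counter


def lm_scores(sentences, lam=0.5):
    """Log P(D|S) for each sentence S, with Jelinek-Mercer smoothing.

    Assumption: the background model is estimated from the document itself,
    so every document word has non-zero probability.
    """
    doc_counts = Counter(w for s in sentences for w in s)
    doc_len = sum(doc_counts.values())
    scores = []
    for s in sentences:
        sent_counts = Counter(s)
        score = 0.0
        for w, c in doc_counts.items():
            p_ml = sent_counts[w] / len(s) if s else 0.0  # sentence model
            p_bg = c / doc_len                            # background model
            score += c * math.log(lam * p_ml + (1 - lam) * p_bg)
        scores.append(score)
    return scores


def bm25_scores(sentences, k1=1.2, b=0.75):
    """BM25 score of each sentence against the document's distinct words.

    Assumption: query term frequency is ignored (a common simplification).
    """
    n = len(sentences)
    avg_len = sum(len(s) for s in sentences) / n or 1.0
    df = Counter(w for s in sentences for w in set(s))  # sentence frequency
    scores = []
    for s in sentences:
        score = 0.0
        for w, f in Counter(s).items():
            idf = math.log(1 + (n - df[w] + 0.5) / (df[w] + 0.5))
            norm = f + k1 * (1 - b + b * len(s) / avg_len)
            score += idf * f * (k1 + 1) / norm
        scores.append(score)
    return scores


def summarize(sentences, scores, ratio=0.3):
    """Keep the top-scoring sentences up to the ratio, in original order."""
    k = max(1, round(len(sentences) * ratio))
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

A call such as `summarize(sents, lm_scores(sents), ratio=0.3)` would return roughly the top 30% of sentences by LM score, in document order. Both scorers expect a non-empty list of sentences.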
Similar resources
A Risk Minimization Framework for Extractive Speech Summarization
In this paper, we formulate extractive summarization as a risk minimization problem and propose a unified probabilistic framework that naturally combines supervised and unsupervised summarization models to inherit their individual merits as well as to overcome their inherent limitations. In addition, the introduction of various loss functions also provides the summarization framework with a fle...
Intonational phrases for speech summarization
Extractive speech summarization approaches select relevant segments of spoken documents and concatenate them to generate a summary. The extraction unit chosen, whether a sentence, syntactic constituent, or other segment, has a significant impact on the overall quality and fluency of the summary. Even though sentences tend to be the choice of most extractive speech summarizers, in this paper...
Positional language modeling for extractive broadcast news speech summarization
Extractive summarization, with the intention of automatically selecting a set of representative sentences from a text (or spoken) document so as to concisely express the most important theme of the document, has been an active area of experimentation and development. A recent trend of research is to employ the language modeling (LM) approach for important sentence selection, which has proven to...
Long story short - Global unsupervised models for keyphrase based meeting summarization
We analyze and compare two different methods for unsupervised extractive spontaneous speech summarization in the meeting domain. Based on utterance comparison, we introduce an optimal formulation for the widely used greedy maximum marginal relevance (MMR) algorithm. Following the idea that information is spread over the utterances in the form of concepts, we describe a system which finds an optimal...
Hybrids of supervised and unsupervised models for extractive speech summarization
Speech summarization, distilling important information and removing redundant and incorrect information from spoken documents, has become an active area of intensive research in the recent past. In this paper, we consider hybrids of supervised and unsupervised models for extractive speech summarization. Moreover, we investigate the use of the unsupervised summarizer to improve the performance o...
Journal title:
- IJCLCLP
Volume 22 Issue
Pages -
Publication date 2017